| Table 1: Flight Delay Summary by Flight Period | ||||
|---|---|---|---|---|
| Flight Period |
Flight Period
|
|||
| Morning | Afternoon | Evening | Total | |
| TotalFlightsCount | 1246031 (41.5%) | 1423140 (47.4%) | 330829 (11.0%) | 3000000 (100%) |
| CancelledFlightsCount | 30690 (38.8%) | 38343 (48.4%) | 10107 (12.8%) | 79140 (100%) |
| DivertedFlightsCount | 2555 (36.2%) | 3901 (55.3%) | 600 (8.5%) | 7056 (100%) |
| AvgCRSDepTime | 08:49:31 | 15:73:19 | 20:66:23 | 13:27:04 |
| AvgDepTime | 08:53:58 | 15:89:05 | 20:12:40 | 13:29:47 |
| AvgDepDelay | 5.23 | 12.93 | 16.51 | 10.12 |
| AvgTaxiOut | 16.87 | 16.44 | 16.65 | 16.64 |
| AvgTaxiIn | 7.75 | 7.78 | 6.95 | 7.68 |
| AvgCRSArrTime | 10:87:15 | 17:85:11 | 17:42:14 | 14:90:34 |
| AvgArrTime | 10:86:01 | 17:71:56 | 15:89:47 | 14:66:31 |
| AvgArrDelay | -0.77 | 7.34 | 10.04 | 4.26 |
| AvgAirTime | 114.12 | 109.8 | 116.31 | 112.31 |
| CarrierDelayCount | 86824 (29.2%) | 162266 (54.6%) | 47861 (16.1%) | 296951 (100%) |
| SecurityDelayCount | 887 (32.1%) | 1434 (52.0%) | 438 (15.9%) | 2759 (100%) |
| WeatherDelayCount | 8380 (26.7%) | 18758 (59.7%) | 4290 (13.7%) | 31428 (100%) |
| NASDelayCount | 80604 (31.4%) | 144366 (56.3%) | 31507 (12.3%) | 256477 (100%) |
| LateAircraftDelayCount | 42721 (16.5%) | 168902 (65.2%) | 47391 (18.3%) | 259014 (100%) |
| Summary includes morning, afternoon, and evening flight periods. | ||||
Bayesian Linear Regression
Analysis of Flight Delay Data
Slides: slides.html
1 Introduction
Bayesian inference was initially developed by Reverend Thomas Bayes, but his ruminations on inverse probability wouldnβt be known until a friend polished and submitted his work to the Royal Society. Bayesβ work was eventually developed and refined by Laplace. Bayesian inference was wildly different from Fisherβs work in defining classical statistical theory (Lesaffre and Lawson 2012).
In opposition to the Bayesian approach is the frequentist approach. The frequentist approach considers the parameter of interest fixed and inference on the parameter is based on the result of repeated sampling. In the Bayesian approach, the parameter of interest is not fixed but stochastic, and inference on the parameter is based on observed data and prior knowledge (Lesaffre and Lawson 2012).
A benefit of the Bayesian approach lies in the ability to include prior knowledge through the selection of a prior. Priors can be subjective or objective. Subjective priors incorporate opinions, of an individual or of a group, which can negatively impact the perceived integrity of the findings. Objective priors are preferred which follow formal rules for determining noninformative priors (Lesaffre and Lawson 2012).
When prior knowledge is lackluster, has little information, or is otherwise not sufficient, a noninformative prior may be used. A common choice for a noninformative prior, in cases with binomial likelihood, is a uniform distribution, also known as a flat prior. In cases with a Gaussian likelihood, the noninformative prior would be a normal prior with \(\sigma^2 \to \infty\) which functions similarly to the flat prior. For cases with a Poisson likelihood, a Gamma(\(\alpha_0\), \(\beta_0\)) prior is used where the sum of the counts are \((\alpha_0 - 1)\) and the experiment size is \(\beta_0\) (Lesaffre and Lawson 2012). For normal linear regression models, conjugate normal-gamma priors can be used to provide a resulting posterior that is of the same family(Yan and Su 2009).
There are a variety of ways to summarize the posterior in order to derive conclusions about the parameter. Its location and variability can be specified by finding the mode, mean, and median of the distribution. Its range can be defined with the equal tail credible interval (not to be confused with the confidence interval in the frequentist approach) or with the high posterior density (HDP) interval. Future predictions for the parameter can be made through posterior predictive distributions (PPD) which factor out \(\theta\) with the assumption that past data will not influence future data, hierarchical independence(Lesaffre and Lawson 2012).
It is not uncommon for the marginal posterior density function of a parameter to be unavailable, requiring an alternate approach to extract insight. It is safe to assert that sampling techniques are used in nearly all Bayesian analyses(Yan and Su 2009). General purpose sampling algorithms available include the inverse ICDF method, the accept-reject algorithm, importance sampling, and the commonly used Monte Carlo integration. The Monte Carlo integration replaces the integral of the posterior with the average obtained from randomly sampled values to provide an expected value. Two popular Markov chain Monte Carlo (MCMC) methods are the Gibbs sampler and the Metropolis-Hastings (MH) algorithm(Lesaffre and Lawson 2012).
2 Methods
2.1 The Frequentist Framework
Linear regression can be achieved using a variety of methods, two of interest are frequentist and Bayesian. The frequentist approach to linear regression is the more familiar approach. It estimates the effects of independent variables(predictors) on dependent variables(the outcome). The regression coefficient is a point estimate, assumed to be a fixed value. Following is the frequentist linear model
\[ Y = \beta_0 + \beta_1X + \varepsilon \]
- \(Y\) : Dependent variable, the outcome
- \(\beta_0\) : y intercept
- \(\beta_1\) : The regression coefficient
- \(X\) : Independent variable
- \(\varepsilon\) : Random error (Yan and Su 2009)
- \(\hat\beta\) provides a point estimate
2.2 The Bayesian Framework
The Bayesian approach estimates the relationship between predictors and an outcome in a similar way, however itβs regression coefficient is not a point estimate, but a distribution. That is, the regression coefficient is not assumed to be a fixed value. The Bayesian approach also goes a step further then frequentist regression in itβs inclusion of prior data. The Bayesian approach is so named because it is based on Bayesβ rule which is written as follows:
\[ Posterior = \frac{Likelihood \times Prior}{Normalization} \]
- The \(Prior\) is model of prior knowledge on the subject
- The \(Likelihood\) is the probability of the data given the prior
- The \(Normalization\) is a constant that ensures the posterior distribution is a valid density function whose integration is equal to 1
- The \(Posterior\) is the probability model that expresses an updated view of the model parameters
- From the initial parameters of the prior
In terms of calculating probability, Bayesβ rule can be written as
\[ p(B|A) = \frac{p(A|B)\cdot p(B)}{p(A)} \]
- Bayesβ rule allows for the calculation of inverse probability (\(p(B|A) \text{ from } p(A|B)\))
- \(p(B|A) \text{ and } p(A|B)\) are conditional probabilities
- \(p(A) \text{ and } p(B)\) are marginal probabilities (Lesaffre and Lawson 2012)
For calculating the probability of continuous parameters, Bayes rule can be applied as
\[ \begin{align*} p(\theta|y) =& \frac{ L(\theta|y)p(\theta) }{p(y)}\\ \\ p(\theta|y) \propto & \text{ }L(\theta|y)p(\theta) \end{align*} \]
The normalization constant (\(p(y)\) above) ensures the posterior distribution is a valid distribution, but the posterior density function can be written without this constant. The resulting prediction is not a point estimate, but a distribution (Bayes1991?). The Bayesian approach is derived with Bayesβ theorem wherein the posterior distribution, the updated belief about the parameter given the data \(p(\theta|y)\), is proportional to the likelihood of \(\theta\) given \(y\), \(L(\theta|y)\), and the prior density of \(\theta\), \(p(\theta)\). The former is known as the likelihood function and would comprise the new data for analysis while the latter allows for the incorporation of prior knowledge regarding \(\theta\)(Yan and Su 2009).
\[ f(\theta|D) \propto L(\theta|D)f(\theta) \]
2.3 The Models
To generate a model for our analysis, we start with the normal data model \(Y_i|\beta_0, \beta_1, \sigma \sim N(\mu, \sigma^2)\) and include the mean specific to our predictor, departure time, \(\mu_i\). The model is
\[ \begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \end{align*} \]
Where:
- \(Y_i\) is the arrival delay for the i-th flight
- \(X_i\) is the departure delay for the i-th flight
- \(\mu_i = \beta_0 + \beta_1X_i\) is the local mean arrival delay, , specific to the departure time
- \(\sigma^2\) is the variance of the errors
- \(\overset{\text{ind}}{\sim}\) indicates conditional independence of each arrival delay with the given parameters
2.4 Prior Selection
Since we are only using two continuous variables for the first model, arrival delay and departure time, the regression parameters will be \(\beta_0\), \(\beta_1\), and \(\sigma\) for intercept, slope, and error As intercept and slope regression parameters can take any real value, we will use normal prior models (Johnson, Ott, and Dogucu 2022).
\[ \begin{align*} \beta_0 &\sim N(m_0, s^2_0)\\ \beta_1 &\sim N(m_1, s^2_1) \end{align*} \]
where \(m_0, s_0, m_1, \text{and } s_1\) are hyperparameters.
The standard deviation parameter must be positive, so we will use an exponential model (Johnson, Ott, and Dogucu 2022).
\[ \sigma \sim \text{Exp}(l) \]
Due to the fact that the exponential model is a special case of the Gamma model, with \(s = 1\), we can use the definitions of the mean and variance of the gamma model to to find that of the exponential model (Johnson, Ott, and Dogucu 2022).
\[ \begin{align*} E(\sigma) = \frac{1}{l} && \text{and} && SD(\sigma) = \frac{1}{l} \end{align*} \]
2.5 Tuning of Priors
Flat vs default vs tuned priors
2.6 The Bayesian Linear Regression Model
The Normal-normal regression model with a continuous predictor can be written as
\[ \begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*} \]
2.7 Assumptions
βNormal regression assumptions
The appropriateness of the Bayesian Normal regression model (9.3) depends upon the following assumptions.
- Structure of the data
- Accounting for predictor \(X\), the observed data \(Y_i\) on case \(i\) is independent of the observed data on any other case \(j\).
- Structure of the relationship
- The typical \(Y\) outcome can be written as a linear function of predictor X, \(ΞΌ = Ξ²_0 + Ξ²_1X\).
- Structure of the variability
- At any value of predictor \(X\), the observed values of \(Y\) will vary normally around their average \(ΞΌ\) with consistent standard deviation \(Ο\).β
2.8 Statistical Programming
Data was collected by the Bureau of Transportation Statistics (BTS) and accessed through a dataset compiled by Patrick Zelazko (Zelazko 2023). The data was imported into R (R Core Team 2023) via CSV. This is a large time-series dataset with with 3 million observations, each a specific flight, and 32 features. The data is from flights within the United States from 2019 through 2023. Diverted and cancelled flights are recorded, as are the time in minutes and attributed reasons for delay. The function stan_glm() was used for simulation of the Normal Bayesian linear regression model from the βrstanarmβ library(Brilleman et al. 2018). This function runs the Markov Chain Monte Carlo simulation as well with specified chains, iterations, and the ability to set a seed. These were set to 4 chains, 2000 iterations, and the seed was set to 123. Simulation of the posterior was done with the posterior_predict() function, also from the βrstanarmβ library(Brilleman et al. 2018). Evaluation of the model was done by considering the data and itβs source, the assumptions of the model, and the accuracy of the prediction. The posterior predictions were evaluated with the prediction_summary() function from the βbayesrulesβlibrary (Dogucu, Johnson, and Ott 2021). This provided median absolute error (MAE) scaled MAE, and the proportion of values that fall within 50% and 95% confidence intervals.
Possibly k-fold cross validation, model averaging
3 Analysis
3.1 The Dataset
This is a large dataset with with 3 million observations, each a specific flight, and 32 features. The data is from flights within the United States from 2019 through 2023. Diverted and cancelled flights are recorded, as are the time in minuted and attributed reasons for delay. Following are the definitions of the given variables in this dataset.
| Header | Description |
|---|---|
| Fl Date | Flight Date (yyyy-mm-dd) |
| Airline | Airline Name |
| Airline DOT | Airline Name and Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| Airline Code | Unique Carrier Code |
| DOT Code | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| Fl Number | Flight Number |
| Origin | Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| Origin City | Origin City Name, State Code |
| Dest | Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| Dest City | Destination City Name, State Code |
| CRS Dep Time | CRS Departure Time (local time: hhmm) |
| Dep Time | Actual Departure Time (local time: hhmm) |
| Dep Delay | Difference in minutes between scheduled and actual departure time. Early departures show negative numbers. |
| Taxi Out | Taxi Out Time, in Minutes |
| Wheels Off | Wheels Off Time (local time: hhmm) |
| Wheels On | Wheels On Time (local time: hhmm) |
| Taxi In | Taxi In Time, in Minutes |
| CRS Arr Time | CRS Arrival Time (local time: hhmm) |
| Arr Time | Actual Arrival Time (local time: hhmm) |
| Arr Delay | Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers. |
| Cancelled | Cancelled Flight Indicator (1=Yes) |
| Cancellation Code | Specifies The Reason For Cancellation |
| Diverted | Diverted Flight Indicator (1=Yes) |
| CRS Elapsed Time | CRS Elapsed Time of Flight, in Minutes |
| Actual Elapsed Time | Elapsed Time of Flight, in Minutes |
| Air Time | Flight Time, in Minutes |
| Distance | Distance between airports (miles) |
| Carrier Delay | Carrier Delay, in Minutes |
| Weather Delay | Weather Delay, in Minutes |
| NAS Delay | National Air System Delay, in Minutes |
| Security Delay | Security Delay, in Minutes |
| Late Aircraft Delay | Late Aircraft Delay, in Minutes |
Table 1 shows the flights distinguished by a new variable, Flight Period, where each time period is comprised of an 8-hour segment (i.e.Β Morning has flights with departure times from 4am to noon followed by afternoon and evening). The Afternoon period is comprised of the most flights (47.4%), followed closely by the Morning period (41.5%), and the Evening period trails the two (11%). The table also gives the means of the departure and arrival times, giving an indication of the density of the flights in the given period. The average departure and arrival delays show much better numbers for the Morning period (5.23, -0.77 minutes) with increasing delays for the Afternoon and Evening periods. The delay counts by type show That the Afternoon and Morning periods account for significantly more of the total delays, though that is without taking into account the smaller contribution of flights by the Evening period on the whole.
3.2 Exploratory Data Analysis
The histograms in Figure 1 illustrate the frequencies of air time, arrival delays, and departure delays. The y-axis was transformed to make the visualizations more legible. All show a skew to the right. This makes sense for air times with a higher proportion of regional flights and the exclusion of international departures and arrivals. Shorter delays (for both arrivals and departures) being more frequent than longer delays is also to be expected.
Figure 2 shows the average arrival delay for the largest five airlines (filtered for carriers with over 200,000 flights in the given period). The standard deviations for these airlines are fairly small, indicating a low variability in the arrival delays for these airlines.
Figure 3 displays the average arrival delay for flights at their origin airport using βplotlyβ (Sievert 2020) and βggmapβ packages(Kahle and Wickham 2013). This comes from the idea that a flightβs arrival delay is collinear with itβs departure delay which may be influenced by itβs origin airport. Airport location information sourced from Random Fractals Inc(Novak 2023).
Figures 4 and 5 show the correlation of the continuous variables and the correlation of the categorical and continuous variables respectively using the βcorrplotβ package(Wei and Simko 2024). Figure 4 doesnβt provide any particularly valuable insight. Arrival delay has an expected positive correlation coefficient with departure delay. There appears to be a slight positive correlation between taxi times and arrival delay and elapsed time which is also expected.
Figure 5 shows the correlation of the continuous and categorical variables using the p-values from the t-tests and ANOVAs of each relation. Relationships with at least one significant p-value are shown in dark blue. This heatmap is of limited value as there appears to be many significant correlations, but without further inspection it is not known which value(s) of the categorical variables have a significant p-value.
Code
p_value_long <- melt(p_value_results, na.rm = TRUE) # Remove NAs
ggplot(data = p_value_long, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "blue", high = "red", na.value = "gray90", name = "p-value") +
labs(x = "Categorical Variable",
y = "Continuous Variable",
caption = "Heatmap of p-values for categorical vs continuous variables.",
tag = "Figure 5") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.caption.position = "plot",
plot.caption = element_text(hjust = 0),
plot.tag.position = "topleft")3.3 Data Preprocessing
To prepare the data for analysis, a few changes were made. Diverted and cancelled flights were removed as they both have arrival delays set to βNAβ due to them not arriving as scheduled. Including cancelled and diverted flights would introduce complexity and could lead to bias in the model. The goal of the model is to understand typical arrival delays, so the inclusion of cancelled and diverted flights is unnecessary. The dataset is also quite large with three million observations, so the dataset was sampled to account for computational efficiency. For the first model, the departure time variable was converted from an βHHMMβ format to a βminutes past midnightβ format for use by the stan_glm() function.
Code
ggplot(Delays_sample, aes(x = DEP_TIME)) +
geom_histogram(binwidth = 10, fill = "blue", color = "black") +
labs(caption="Distribution of departure time.",
x = "Departure Time (time HHMM)",
y = "Frequency",
tag = "Figure 6") +
xlim(NA, 2400) +
theme(plot.caption.position = "plot",
plot.caption = element_text(hjust = 0),
plot.tag.position = "topleft")Code
convert_to_minutes <- function(time) {
hour <- time %/% 100
minute <- time %% 100
total_minutes <- hour * 60 + minute
return(total_minutes)
}
Delays_sample <- Delays_sample %>%
mutate("DEP_TIME_MINS" = sapply(DEP_TIME, convert_to_minutes))Code
ggplot(Delays_sample, aes(x = DEP_TIME_MINS)) +
geom_histogram(binwidth = 10, fill = "blue", color = "black") +
labs(caption = "Distribution of adjusted departure time.",
x = "Departure Time (mins past midnight)",
y = "Frequency",
tag = "Figure 7") +
xlim(NA, 1440) +
theme(plot.caption.position = "plot",
plot.caption = element_text(hjust = 0),
plot.tag.position = "topleft")For the second model, βDay of the Weekβ was the chosen predictor variable which was created using the βlubridateβ library(Grolemund and Wickham 2011).
Code
Delays_sample1 <- Delays_sample %>%
mutate(FL_DATE = as.Date(FL_DATE, format = "%Y-%m-%d"))
ggplot(Delays_sample1, aes(x = FL_DATE)) +
geom_histogram(binwidth = 30, fill = "skyblue", color = "black") +
labs(
caption = "Flight counts by date.",
x = "Flight Date",
y = "Count",
tag = "Figure 8") +
theme_minimal() +
theme(plot.caption.position = "plot",
plot.caption = element_text(hjust = 0),
plot.tag.position = "topleft")Code
library(lubridate)
Delays_sample <- Delays_sample %>%
mutate(DAY_OF_WEEK = wday(FL_DATE, label = TRUE, abbr = TRUE))
mean_delay_by_day <- Delays_sample %>%
group_by(DAY_OF_WEEK) %>%
summarise(mean_arr_delay = mean(ARR_DELAY),
sd_arr_delay = sd(ARR_DELAY))Code
ggplot(Delays_sample, aes(x = DAY_OF_WEEK, y = ARR_DELAY)) +
geom_boxplot(outlier.shape = NA) +
geom_point(data = mean_delay_by_day, aes(x = DAY_OF_WEEK, y = mean_arr_delay),
color = "red4", size =3, shape = 8) +
labs(
caption = "Arrival delay by the day of the week.",
x = "Day of the Week",
y = "Arrival Delay (minutes)",
tag = "Figure 9" ) +
theme_minimal()+
ylim(-45,45) +
theme(plot.caption.position = "plot",
plot.caption = element_text(hjust = 0),
plot.tag.position = "topleft")3.4 Modeling
The Normal Data Model: Departure Time Predictor
For the continuous predictor, Departure Time, three normal models with different priors are sampled: - Flat Priors - Rstanarmβs Default Priors - Tuned Priors
\[ \begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*} \]
Flat Continuous Model
Code
library(broom.mixed)
library(rstanarm)
library(ggpubr)
flat_model_dt <- stan_glm(ARR_DELAY ~ DEP_TIME_MINS,
data = Delays_sample,
family = gaussian(),
prior = NULL,
prior_intercept = NULL,
prior_aux = NULL,
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
fmdt <- tidy(flat_model_dt, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
# Simulate a set of predictions
set.seed(123)
shortcut_prediction1 <-
posterior_predict(flat_model_dt, newdata = data.frame(DEP_TIME_MINS = 720))
ggarrange(
mcmc_dens(shortcut_prediction1) +
xlab("Delay (minutes)") +
ggtitle("Figure 10 (a) Predicted Arrival Delays for a Departure Time of Noon, Flat Priors")
)Code
model <- flat_model_dt
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 10 (b) MCMC Trace") +ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 10 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 10 (d) Autocorrelation"))The posterior median relationship for the first model with flat priors is
\[ -10.92 + 0.02X. \]
That is, at \(0 \text{ minutes}\) (i.e.Β Midnight), the expected delay is \(-10.92\). For every minute past midnight, the delay is expected to increase by \(0.02 \text{ minutes}\) (\(1.2 \text{ seconds}\)). At noon(\(X = 720 \text{ minutes}\)), the expected delay for any flight is \(3.48 \text{ minutes}\) (\(\sim 3 \text{ minutes and } 29\text{ seconds}\)).
Default Continuous Model
Code
library(rstanarm)
default_model_dt <- stan_glm(ARR_DELAY ~ DEP_TIME_MINS,
data = Delays_sample,
family = gaussian(),
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
dmdt <- tidy(default_model_dt, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
# Simulate a set of predictions
set.seed(123)
shortcut_prediction2 <-
posterior_predict(default_model_dt, newdata = data.frame(DEP_TIME_MINS = 720))
ggarrange(
mcmc_dens(shortcut_prediction2) +
xlab("Delay (minutes)") +
ggtitle("Figure 11 (a) Predicted Arrival Delays for a Departure Time of Noon, Default Priors")
)Code
model <- default_model_dt
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 11 (b) MCMC Trace") +ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 11 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 11 (d) Autocorrelation"))The posterior median relationship for the first model with default priors (tuned via rstanarm package)
\[ -10.94 + 0.02X. \]
This version of the model only had a small decrease in the intercept.
Tuned Continuous Model
\(\beta_0\) informs the model intercept
Code
summary(Delays_sample$DEP_TIME_MINS) #mean departure time is 809.3 minutes (~ 1:30pm)
Delays_sample_filtered_B0 <- subset(Delays_sample, DEP_TIME_MINS >= 800 & DEP_TIME_MINS <= 820)
mean(Delays_sample_filtered_B0$ARR_DELAY) #m_0c = 2
sd(Delays_sample_filtered_B0$ARR_DELAY) #s_0c = 36\(\beta_{0c}\) reflects the typical arrival delay at a typical departure time. With a mean departure time at \(\sim\) 1:30pm, the average arrival delay is \(\sim\) 2 minutes with a standard deviation \(\sim\) 36 minutes.
\[ \beta_{0c} \sim N(2, 36^2) \]
\(\beta_1\) informs the model slope
Code
lm_model <- lm(ARR_DELAY ~ DEP_TIME_MINS, data = Delays_sample)
summary(lm_model)
coef(lm_model)["DEP_TIME_MINS"] #m_1 = 0.01903
summary(lm_model)$coefficients["DEP_TIME_MINS", "Std. Error"] #s_1 = 0.0005The slope of the linear model indicates a 0.019 minute increase in arrival delay per minute increase in departure time, so we set \(m_1 = 0.02\). The standard error reflects high confidence at 0.0005, but as to not limit the model we will set it lower at \(s_1 = 0.01\).
\[ \beta_{1} \sim N(0.02, 0.01^2) \]
\(\sigma\) informs the regression standard deviation
Code
summary(lm_model)$sigmaTo tune the exponential model, we set the expected value of the standard deviation, $ E() $, equal to the residual standard error, \(\sim 50\). With this, we can find the rate parameter, \(l\).
\[ \begin{align*} E(\sigma) &= \frac{1}{l} = 50\\\\ l &= \frac{1}{50} = 0.02\\\\ \sigma &\sim \text{Exp}(0.02) \end{align*} \]
\[ \begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(2, 36^2)\\ \beta_1 &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*} \]
Code
library(rstanarm)
tuned_model_dt <- stan_glm(ARR_DELAY ~ DEP_TIME_MINS,
data = Delays_sample,
family = gaussian(),
prior_intercept = normal(2, 1296),
prior = normal(0.02, 0.0001),
prior_aux = exponential(0.02),
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
library(broom.mixed)
tmdt <- tidy(tuned_model_dt, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
# Store the 4 chains for each parameter in 1 data frame
model_tuned_df <- as.data.frame(tuned_model_dt)
##TUNED
# Simulate a set of predictions
set.seed(123)
shortcut_prediction <-
posterior_predict(tuned_model_dt, newdata = data.frame(DEP_TIME_MINS = 720))
ggarrange(
mcmc_dens(shortcut_prediction) +
xlab("Delay (minutes)") +
ggtitle("Figure 12 (a) Predicted Arrival Delays for a Departure Time of Noon, Tuned Priors")
)Code
model <- tuned_model_dt
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 12 (b) MCMC Trace") +ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 12 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 12 (d) Autocorrelation"))The posterior median relationship for the first model with manually tuned priors is
\[ -11.66 + 0.02X. \]
This version of the model had a larger decrease in the intercept (just over 43 seconds), but there was no large change in the slope.
The Normal Data Model: Week Day Predictor
The categorical predictor, Day of the Week, is given a similar treatment to the continuous predictor with the sampling of three normal models with different priors: - Flat Priors - Rstanarmβs Default Priors - Tuned Priors Tuesday is set to be the reference as it has the lowest mean delay.
\[ \begin{align*} Y_i|\beta_0, \beta_1, ... \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... \beta_6X_{i6} \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*} \]
The general formula is adjusted for the inclusion of regression coefficients for each day of the week \(\beta_1, \beta_2, ..., \beta_6\), where the intercept coefficient \(\beta_0\) reflects Tuesdayβs arrival delay. Only one prior for the predictor must be specified as the stan_glm() function handles categorical variables in a similar way to the glm() function, through dummy coding.
Flat Categorical Model
Code
Delays_sample$DAY_OF_WEEK <- factor(
Delays_sample$DAY_OF_WEEK,
ordered = FALSE)
Delays_sample <- Delays_sample %>%
mutate(DAY_OF_WEEK = relevel(as.factor(DAY_OF_WEEK), ref = "Tue"))
flat_model_dow <- stan_glm(
ARR_DELAY ~ DAY_OF_WEEK,
data = Delays_sample,
family = gaussian(),
prior = NULL,
prior_intercept = NULL,
prior_aux = NULL,
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
model <- flat_model_dow
fmdow <- tidy(model, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
library(ggplot2)
library(ggridges)
library(dplyr)
#Stacked distributions
new_data <- data.frame(DAY_OF_WEEK = levels(Delays_sample$DAY_OF_WEEK))
predictions <- posterior_predict(model, newdata = new_data)
pred_df <- as.data.frame(predictions)
colnames(pred_df) <- levels(Delays_sample$DAY_OF_WEEK)
pred_long <- pred_df %>%
pivot_longer(cols = everything(), names_to = "DAY_OF_WEEK", values_to = "ARR_DELAY")
pred_long$DAY_OF_WEEK <- factor(pred_long$DAY_OF_WEEK, levels = c("Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon"))
ggplot(pred_long, aes(x = ARR_DELAY, y = DAY_OF_WEEK, fill = DAY_OF_WEEK)) +
geom_density_ridges(alpha = 0.7, scale = 0.8) +
stat_summary(fun = mean,
geom = "vline",
aes(xintercept = ..x.., color = DAY_OF_WEEK),
linetype = "dashed", linewidth = 0.5, show.legend = FALSE) +
labs(
title = "Figure 13 (a) Posterior Predicted Distribution of Arrival Delay by Day of the Week, Flat Priors",
x = "Predicted Arrival Delay (minutes)",
y = " ",
fill = "Day"
) +
xlim(-150,150)+
theme_minimal()Code
model <- flat_model_dow
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 13 (b) MCMC Trace") +ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 13 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 13 (d) Autocorrelation"))The posterior median relationship for the second model with flat priors is
\[ 1.57 + 4.92X_1 + 2.66X_2 + 1.86X_3 + 3.43X_4 + 4.36X_5 + 2.93X_6. \]
That is, on Tuesday one can expect a delay of \(1.57\) minutes (\(1\) minute and \(34\) seconds). One can expect the longest delay on Wednesday at \(6\) minutes and \(29\) seconds.
Default Categorical Model
Code
default_model_dow <- stan_glm(
ARR_DELAY ~ DAY_OF_WEEK,
data = Delays_sample,
family = gaussian(),
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
model <- default_model_dow
dmdow <- tidy(model, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
library(ggplot2)
library(ggridges)
library(dplyr)
#Stacked distributions
new_data <- data.frame(DAY_OF_WEEK = levels(Delays_sample$DAY_OF_WEEK))
predictions <- posterior_predict(model, newdata = new_data)
pred_df <- as.data.frame(predictions)
colnames(pred_df) <- levels(Delays_sample$DAY_OF_WEEK)
pred_long <- pred_df %>%
pivot_longer(cols = everything(), names_to = "DAY_OF_WEEK", values_to = "ARR_DELAY")
pred_long$DAY_OF_WEEK <- factor(pred_long$DAY_OF_WEEK, levels = c("Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon"))
ggplot(pred_long, aes(x = ARR_DELAY, y = DAY_OF_WEEK, fill = DAY_OF_WEEK)) +
geom_density_ridges(alpha = 0.7, scale = 0.8) +
stat_summary(fun = mean,
geom = "vline",
aes(xintercept = ..x.., color = DAY_OF_WEEK),
linetype = "dashed", linewidth = 0.5, show.legend = FALSE) +
labs(
title = "Figure 14 (a) Posterior Predicted Distribution of Arrival Delay by Day of the Week, Default Priors",
x = "Predicted Arrival Delay (minutes)",
y = " ",
fill = "Day"
) +
xlim(-150,150)+
theme_minimal()Code
model <- default_model_dow
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 14 (b) MCMC Trace") +ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 14 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 14 (d) Autocorrelation"))The posterior median relationship for the second model with default priors is
\[ 1.54 + 4.40X_1 + 2.69X_2 + 2.97X_3 + 4.96X_4 + 3.48X_5 + 1.89X_6. \]
This version of the second model showed little change for the expectations of Tuesday through Thursday, but the rest of the week had shifts of more than a minute for all day except Sunday with a change of \(53\) seconds.
Tuned Categorical Model
For arrival delays by the day of the week, the Figure 9 shows mean arrival delays are between 1 and 7 minutes while the median arrival delays are all in the negative, indicating a skew towards larger delays.
\(\beta_0\) informs the model intercept
Code
mean_delay_by_day\(\beta_{0}\) reflects the mean arrival delay on Tuesday, our reference. The average arrival delay is \(\sim\) 2 minutes with a standard deviation \(\sim\) 46 minutes.
\[ \beta_{0} \sim N(2, 46^2) \]
\(\beta_j\) informs the model slopes
For a categorical predictor with the stan_glm() function, the tuned prior, \(\beta_j\), is applied to to the estimation of each coefficient associated with the individual levels of the predictor ($_1, _2, β¦, _6 $). For this reason, we set the coefficient prior to be weakly informative.
\[ \beta_{j} \sim N(0, 50^2) \]
\(\sigma\) informs the regression standard deviation
Code
lm_model <- lm(ARR_DELAY ~ DAY_OF_WEEK, data = Delays_sample)
summary(lm_model)$sigmaTo tune the exponential model, we set the expected value of the standard deviation, $ E() $, equal to the residual standard error which is the same as with the previous model, \(\sim 50\).
\[ \begin{align*} E(\sigma) &= \frac{1}{l} = 50\\\\ l &= \frac{1}{50} = 0.02\\\\ \sigma &\sim \text{Exp}(0.02) \end{align*} \]
The tuned model is as follows,
\[ \begin{align*} Y_i|\beta_0, \beta_1, ... \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... \beta_6X_{i6} \\ \beta_{0} &\sim N(2, 46^2)\\ \beta_j &\sim N(0, 50^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*} \]
Code
tuned_model_dow <- stan_glm(
ARR_DELAY ~ DAY_OF_WEEK,
data = Delays_sample,
family = gaussian(),
prior = normal(2,2116),
prior_intercept = normal(0,2500),
prior_aux = exponential(0.02),
chains = 4, iter = 2000, seed = 123,
refresh = 0
)
model <- tuned_model_dow
tmdow <- tidy(model, effects = c("fixed", "aux"),
conf.int = TRUE, conf.level = 0.95)
library(ggplot2)
library(ggridges)
library(dplyr)
#Stacked distributions
new_data <- data.frame(DAY_OF_WEEK = levels(Delays_sample$DAY_OF_WEEK))
predictions <- posterior_predict(model, newdata = new_data)
pred_df <- as.data.frame(predictions)
colnames(pred_df) <- levels(Delays_sample$DAY_OF_WEEK)
pred_long <- pred_df %>%
pivot_longer(cols = everything(), names_to = "DAY_OF_WEEK", values_to = "ARR_DELAY")
pred_long$DAY_OF_WEEK <- factor(pred_long$DAY_OF_WEEK, levels = c("Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon"))
ggplot(pred_long, aes(x = ARR_DELAY, y = DAY_OF_WEEK, fill = DAY_OF_WEEK)) +
geom_density_ridges(alpha = 0.7, scale = 0.8) +
stat_summary(fun = mean,
geom = "vline",
aes(xintercept = ..x.., color = DAY_OF_WEEK),
linetype = "dashed", linewidth = 0.5, show.legend = FALSE) +
labs(
title = "Figure 15 (a) Posterior Predicted Distribution of Arrival Delay by Day of the Week, Tuned Priors",
x = "Predicted Arrival Delay (minutes)",
y = " ",
fill = "Day"
) +
xlim(-150,150)+
theme_minimal()Code
model <- tuned_model_dow
ggarrange(mcmc_trace(model, size = 0.1) + ggtitle("Figure 15 (b) MCMC Trace") + ylab("pi"))Code
ggarrange(mcmc_dens_overlay(model) + ggtitle("Figure 15 (c) MCMC Overlay") + ylab("density") + xlab("pi"))Code
ggarrange(mcmc_acf(model) + ggtitle("Figure 15 (d) Autocorrelation"))The posterior median relationship for the second model with manually tuned priors is
\[ 1.54 + 4.96X_1 + 2.70X_2 + 1.91X_3 + 3.50X_4 + 4.41X_5 + 2.99X_6. \]
The tuned version of the second model showed coefficients more similar to that of the flat model.
3.5 Results
| Table 2. Estimations of the Posterior Distributionsβ Regression Coefficients. | ||||
|---|---|---|---|---|
| Mean | SD | 95% CI | ||
| Model 1: Continuous Predictor | ||||
| Flat Priors | π½β intercept | -10.92 | 0.47 | (-11.85; 0.02) |
| π½β Departure Time | 0.02 | 0.00 | (0.02; 50.86) | |
| π | 51.09 | 0.12 | (50.86; -11.86) | |
| Default Tuned Priors | π½βΒ Β intercept | -10.94 | 0.47 | (-11.86; 0.02) |
| π½β Departure Time | 0.02 | 0.00 | (0.02; 50.86) | |
| π | 51.09 | 0.12 | (50.86; -12.02) | |
| Tuned Priors | π½β intercept | -11.66 | 0.17 | (-12.02; 0.02) |
| π½β Departure Time | 0.02 | 0.00 | (0.02; 50.87) | |
| π | 51.10 | 0.12 | (50.87; 0.00) | |
| Model 2: Categorical Predictor | ||||
| Flat Priors | π½β intercept | 1.57 | 0.44 | (0.69; 3.72) |
| π½β Wednesday | 4.92 | 0.61 | (3.72; 1.47) | |
| π½β Thursday | 2.66 | 0.65 | (1.47; 0.69) | |
| π½β Friday | 1.86 | 0.62 | (0.69; 2.25) | |
| π½β Saturday | 3.43 | 0.61 | (2.25; 3.21) | |
| π½β Sunday | 4.36 | 0.61 | (3.21; 1.72) | |
| π½β Monday | 2.93 | 0.65 | (1.72; 51.16) | |
| π | 51.39 | 0.12 | (51.16; 0.73) | |
| Default Tuned Priors | π½β intercept | 1.54 | 0.40 | (0.73; 3.21) |
| π½β Wednesday | 4.40 | 0.60 | (3.21; 1.48) | |
| π½β Thursday | 2.69 | 0.60 | (1.48; 1.75) | |
| π½β Friday | 2.97 | 0.64 | (1.75; 3.77) | |
| π½β Saturday | 4.96 | 0.59 | (3.77; 2.31) | |
| π½β Sunday | 3.48 | 0.58 | (2.31; 0.74) | |
| π½β Monday | 1.89 | 0.60 | (0.74; 51.17) | |
| π | 51.40 | 0.12 | (51.17; 0.66) | |
| Tuned Priors | π½β intercept | 1.54 | 0.44 | (0.66; 3.76) |
| π½β Wednesday | 4.96 | 0.64 | (3.76; 1.48) | |
| π½β Thursday | 2.70 | 0.61 | (1.48; 0.71) | |
| π½β Friday | 1.91 | 0.64 | (0.71; 2.28) | |
| π½β Saturday | 3.50 | 0.63 | (2.28; 3.23) | |
| π½β Sunday | 4.41 | 0.61 | (3.23; 1.73) | |
| π½β Monday | 2.99 | 0.64 | (1.73; 51.17) | |
| π | 51.39 | 0.12 | (51.17; 0.00) | |
4 Discussion
4.1 Comparison of the Models
Cross validation was done to provide an estimation of model performance. The provided posterior predictive summary is comprised of the median absolute error (MAE) which measures the typical difference between observed and posterior predictive means, the scaled MAE (scaled_MAE) which scales MAE by standard deviations, and the proportion of observations that fall within the \(50\%\) and \(95\%\) posterior prediction intervals. The cross validation assessment (prediction_summary_cv()) was chosen over the posterior prediction summary assessment (prediction_summary()), both from the βbayesrulesβ package(Dogucu, Johnson, and Ott 2021), as the cross validation procedure provides a more conservative estimate of model performance. Cross validation can also be more computationally efficient with a lower number of folds and the use of a smaller dataset.
Code
library(bayesrules)
# #Posterior Predictive Summary
#
# set.seed(123)
# ps_fmdt <- prediction_summary(flat_model_dt, data = Delays_sample)
#
# set.seed(123)
# ps_dmdt <- prediction_summary(default_model_dt, data = Delays_sample)
#
# set.seed(123)
# ps_tmdt <- prediction_summary(tuned_model_dt, data = Delays_sample)
#
# ps_fmdt <- ps_fmdt %>%
# mutate(Model = "fmdt")
# ps_dmdt <- ps_dmdt %>%
# mutate(Model = "dmdt")
# ps_tmdt <- ps_tmdt %>%
# mutate(Model = "tmdt")
#
# ps_results <- bind_rows(ps_fmdt, ps_dmdt, ps_tmdt)
#
# print(ps_results)
#K-fold Cross Validation with k=5, data n=1000
set.seed(123)
sample_size <- 1000
Delays_sample2 <- Delays_sample %>%
sample_n(sample_size)
set.seed(123)
cv_fmdt <- prediction_summary_cv(
model = flat_model_dt, data = Delays_sample2, k = 5)
set.seed(123)
cv_dmdt <- prediction_summary_cv(
model = default_model_dt, data = Delays_sample2, k = 5)
set.seed(123)
cv_tmdt <- prediction_summary_cv(
model = tuned_model_dt, data = Delays_sample2, k = 5)
cv_fmdt <- cv_fmdt$cv %>%
mutate(Model = "fmdt")
cv_dmdt <- cv_dmdt$cv %>%
mutate(Model = "dmdt")
cv_tmdt <- cv_tmdt$cv %>%
mutate(Model = "tmdt")
cv_results <- bind_rows(cv_fmdt, cv_dmdt, cv_tmdt)
print(cv_results)Code
#Posterior Predictive Summary
#
# set.seed(123)
# ps_fmdow <- prediction_summary(flat_model_dow, data = Delays_sample)
#
# set.seed(123)
# ps_dmdow <- prediction_summary(default_model_dow, data = Delays_sample)
#
# set.seed(123)
# ps_tmdow <- prediction_summary(tuned_model_dow, data = Delays_sample)
#
# ps_fmdow <- ps_fmdow %>%
# mutate(Model = "fmdow")
# ps_dmdow <- ps_dmdow %>%
# mutate(Model = "dmdow")
# ps_tmdow <- ps_tmdow %>%
# mutate(Model = "tmdow")
#
# ps_results2 <- bind_rows(ps_fmdow, ps_dmdow, ps_tmdow)
#
# print(ps_results2)
#K-fold Cross Validation with k=5, data n=1000
set.seed(123)
sample_size <- 1000
Delays_sample2 <- Delays_sample %>%
sample_n(sample_size)
set.seed(123)
cv_fmdow <- prediction_summary_cv(
model = flat_model_dow, data = Delays_sample2, k = 5)
set.seed(123)
cv_dmdow <- prediction_summary_cv(
model = default_model_dow, data = Delays_sample2, k = 5)
set.seed(123)
cv_tmdow <- prediction_summary_cv(
model = tuned_model_dow, data = Delays_sample2, k = 5)
cv_fmdow <- cv_fmdow$cv %>%
mutate(Model = "fmdow")
cv_dmdow <- cv_dmdow$cv %>%
mutate(Model = "dmdow")
cv_tmdow <- cv_tmdow$cv %>%
mutate(Model = "tmdow")
cv_results2 <- bind_rows(cv_fmdow, cv_dmdow, cv_tmdow)
print(cv_results2)| Table 3. Posterior predictive results from cross validation. | ||||
|---|---|---|---|---|
| MAE | MAE Scaled | Within 50% | Within 95% | |
| Β Β Β Β Β Model 1: Continuous Predictor | ||||
| Flat Priors | 15.730 | 0.313 | 0.841 | 0.966 |
| Default Tuned Priors | 15.779 | 0.314 | 0.840 | 0.966 |
| Tuned Priors | 15.668 | 0.312 | 0.849 | 0.966 |
| Β Β Β Β Β Model 2: Categorical Predictor | ||||
| Flat Priors | 17.110 | 0.338 | 0.866 | 0.965 |
| Default Tuned Priors | 17.080 | 0.338 | 0.866 | 0.966 |
| Tuned Priors | 17.118 | 0.339 | 0.867 | 0.966 |
Code
pp_check(flat_model_dt, nreps = 50) + ggtitle("Posterior Predictive Check: fmdt")Code
pp_check(default_model_dt, nreps = 50) + ggtitle("Posterior Predictive Check: dmdt")Code
pp_check(tuned_model_dt, nreps = 50) + ggtitle("Posterior Predictive Check: tmdt")Code
pp_check(flat_model_dow, nreps = 50) + ggtitle("Posterior Predictive Check: fmdow")Code
pp_check(default_model_dow, nreps = 50) + ggtitle("Posterior Predictive Check: dmdow")Code
pp_check(tuned_model_dow, nreps = 50) + ggtitle("Posterior Predictive Check: tmdow")Code
fmdt_neff <- neff_ratio(flat_model_dt)
dmdt_neff <- neff_ratio(default_model_dt)
tmdt_neff <- neff_ratio(tuned_model_dt)
fmdow_neff <- neff_ratio(flat_model_dow)
dmdow_neff <- neff_ratio(default_model_dow)
tmdow_neff <- neff_ratio(tuned_model_dow)
fmdt_neff <- as_tibble_row(fmdt_neff) %>%
mutate(Model = "fmdt")
dmdt_neff <- as_tibble_row(dmdt_neff) %>%
mutate(Model = "dmdt")
tmdt_neff <- as_tibble_row(tmdt_neff) %>%
mutate(Model = "tmdt")
fmdow_neff <- as_tibble_row(fmdow_neff) %>%
mutate(Model = "fmdow")
dmdow_neff <- as_tibble_row(dmdow_neff) %>%
mutate(Model = "dmdow")
tmdow_neff <- as_tibble_row(tmdow_neff) %>%
mutate(Model = "tmdow")
neff_table_m1 <- bind_rows(fmdt_neff, dmdt_neff, tmdt_neff)
neff_table_m2 <- bind_rows(fmdow_neff, dmdow_neff, tmdow_neff)
neff_table_m1
neff_table_m2| Table 4. Effective sample size ratios for Model 1. | ||
|---|---|---|
| Flat Priors | π½β intercept | 0.83 |
| π½β Departure Time | 1.19 | |
| π | 0.48 | |
| Default Tuned Priors | π½βΒ Β intercept | 0.80 |
| π½β Departure Time | 1.10 | |
| π | 0.80 | |
| Tuned Priors | π½β intercept | 0.64 |
| π½β Departure Time | 0.71 | |
| π | 1.04 | |
| Table 5. Effective sample size ratios for Model 2. | ||
|---|---|---|
| Flat Priors | π½β intercept | 0.27 |
| π½β Wednesday | 0.36 | |
| π½β Thursday | 0.38 | |
| π½β Friday | 0.36 | |
| π½β Saturday | 0.37 | |
| π½β Sunday | 0.35 | |
| π½β Monday | 0.36 | |
| π | 2.92 | |
| Default Tuned Priors | π½β intercept | 0.37 |
| π½β Wednesday | 0.60 | |
| π½β Thursday | 0.61 | |
| π½β Friday | 0.56 | |
| π½β Saturday | 0.58 | |
| π½β Sunday | 0.59 | |
| π½β Monday | 0.55 | |
| π | 0.66 | |
| Tuned Priors | π½β intercept | 0.47 |
| π½β Wednesday | 0.97 | |
| π½β Thursday | 0.89 | |
| π½β Friday | 1.00 | |
| π½β Saturday | 0.95 | |
| π½β Sunday | 0.97 | |
| π½β Monday | 0.93 | |
| π | 0.21 | |
Code
fmdt_rhat <- rhat(flat_model_dt)
dmdt_rhat <- rhat(default_model_dt)
tmdt_rhat <- rhat(tuned_model_dt)
fmdow_rhat <- rhat(flat_model_dow)
dmdow_rhat <- rhat(default_model_dow)
tmdow_rhat <- rhat(tuned_model_dow)
fmdt_rhat <- as_tibble_row(fmdt_rhat) %>%
mutate(Model = "fmdt")
dmdt_rhat <- as_tibble_row(dmdt_rhat) %>%
mutate(Model = "dmdt")
tmdt_rhat <- as_tibble_row(tmdt_rhat) %>%
mutate(Model = "tmdt")
fmdow_rhat <- as_tibble_row(fmdow_rhat) %>%
mutate(Model = "fmdow")
dmdow_rhat <- as_tibble_row(dmdow_rhat) %>%
mutate(Model = "dmdow")
tmdow_rhat <- as_tibble_row(tmdow_rhat) %>%
mutate(Model = "tmdow")
rhat_table_m1 <- bind_rows(fmdt_rhat, dmdt_rhat, tmdt_rhat)
rhat_table_m2 <- bind_rows(fmdow_rhat, dmdow_rhat, tmdow_rhat)
rhat_table_m1
rhat_table_m2| Table 6. R-hat metric for Model 1. | ||
|---|---|---|
| Flat Priors | π½β intercept | 0.9997 |
| π½β Departure Time | 0.9996 | |
| π | 1.0015 | |
| Default Tuned Priors | π½βΒ Β intercept | 1.0008 |
| π½β Departure Time | 1.0005 | |
| π | 1.0004 | |
| Tuned Priors | π½β intercept | 1.0004 |
| π½β Departure Time | 0.9996 | |
| π | 1.0004 | |
| Table 7. R-hat metric for Model 2. | ||
|---|---|---|
| Flat Priors | π½β intercept | 1.0045 |
| π½β Wednesday | 1.0027 | |
| π½β Thursday | 1.0027 | |
| π½β Friday | 1.0028 | |
| π½β Saturday | 1.0012 | |
| π½β Sunday | 1.0036 | |
| π½β Monday | 1.0024 | |
| π | 0.9994 | |
| Default Tuned Priors | π½β intercept | 1.0015 |
| π½β Wednesday | 1.0001 | |
| π½β Thursday | 1.0015 | |
| π½β Friday | 0.9996 | |
| π½β Saturday | 1.0010 | |
| π½β Sunday | 0.9995 | |
| π½β Monday | 1.0002 | |
| π | 0.9995 | |
| Tuned Priors | π½β intercept | 1.0003 |
| π½β Wednesday | 0.9995 | |
| π½β Thursday | 1.0002 | |
| π½β Friday | 0.9997 | |
| π½β Saturday | 0.9998 | |
| π½β Sunday | 1.0008 | |
| π½β Monday | 0.9996 | |
| π | 1.0056 | |
4.2 Evaluation of the Models
All notes past this point are for future reference
Consideration the Dataset
- how was data collected, by who and for what purpose, whaty impact might the results have, what biases are baked in to the analyses (scope of the data)
Checking Model Assumptions
βNormal regression assumptions
The appropriateness of the Bayesian Normal regression model (9.3) depends upon the following assumptions.
- Structure of the data
- Accounting for predictor \(X\), the observed data \(Y_i\) on case \(i\) is independent of the observed data on any other case \(j\).
- Structure of the relationship
- The typical \(Y\) outcome can be written as a linear function of predictor X, \(ΞΌ = Ξ²_0 + Ξ²_1X\).
- Structure of the variability
- At any value of predictor \(X\), the observed values of \(Y\) will vary normally around their average \(ΞΌ\) with consistent standard deviation \(Ο\).β
Assumption 2 & 3: scatterplot of raw data, pp_check
# Examine 50 of the 20000 simulated samples
pp_check(bike_model, nreps = 50) +
xlab("rides")
Accuracy of Posterior Predictive Models
discussion on MCMC simulations -> trace, overlay, acf
Neff > (0.1)*N neff_ratio()
5 Conclusion
Summarize your key findings.
Discuss the implications of your results.